ABSTRACT

Aim: The aim of this study was to search for metabolic biomarkers of Crohn's disease (CD). 

Background: Crohn's disease (CD) is a type of inflammatory bowel disease that causes a wide variety of symptoms.
CD can affect any part of the gastrointestinal tract from mouth to anus. CD is not easily diagnosed because
current monitoring tools are insufficient. Thus, better methods are needed for the early diagnosis of CD.

Patients and methods: We used metabolic profiling by proton nuclear magnetic resonance spectroscopy
(¹H NMR) to identify metabolites in serum. Classification of CD and healthy subjects was performed using partial least squares
discriminant analysis (PLS-DA).

Results: According to the PLS-DA model, the CD and control groups could be separated using a single descriptor.
The lipid level in the blood serum of CD patients was decreased compared with healthy subjects. For the
external test set, the classification model correctly classified 94% of CD and healthy subjects.

Conclusion: The results of the classification model suggest that NMR-based metabonomics is a useful diagnostic tool for CD
and may provide insight into potential targets for disease therapy and prevention.

Introduction

Crohn's disease (CD) is a type of inflammatory
bowel disease (IBD) that most commonly affects the
last part of the small intestine and the first part of
the large intestine (1-3). Both the host genotype and
environmental factors play a role in the etiology of CD.
The presence of bacteria is also required for disease
induction.
Several serological biomarkers have been proposed
for the accurate diagnosis of CD, but they are used
only in conjunction with, and as a supplement to,
endoscopy. Thus, new monitoring tools such as
metabonomics are needed for early diagnosis (4).
Metabonomics is defined as "the quantitative
measurement of the dynamic multi-parametric
response of living systems to pathophysiological
stimuli or genetic modification" (5). One of the
most commonly applied techniques in metabonomics
is proton nuclear magnetic resonance (¹H NMR) spectroscopy.
This analytical technique provides quantitative and
reproducible information with little sample
preparation, and hence it is widely used to build
metabolic profiles in diverse metabolic studies.
¹H NMR spectroscopy of biofluid and tissue
samples has been applied to the investigation and
diagnosis of many gastrointestinal diseases (6-8).
Several studies have investigated metabolite
profiles in the serum of patients with CD (9). In
the study by Schicho et al., metabolites in the serum
and plasma of IBD subjects were analysed. They
obtained regular one-dimensional proton NMR
spectra using a standard pulse sequence (Bruker
pulse program prnoesyld) (9). In another study,
Fathi et al. applied classification and regression
trees (CART) to explore the metabolic biomarkers
of CD compared with a control group (10).
In this study, we used ¹H NMR spectroscopy to
determine the metabolites and performed a
quantitative analysis of metabolites in the serum
of patients with active CD. Partial least squares
discriminant analysis (PLS-DA) was employed as
the classification method. Our aim was to search
for metabolic biomarkers of CD that can classify
the control and CD groups.

Patients and Methods 

Sample collection 

Twenty-six adult patients diagnosed with Crohn's
disease, with a mean age (± standard deviation) of
33.6 ± 11.3 years, and twenty-nine healthy
subjects, with a mean age (± standard deviation) of
34.7 ± 12.2 years, were recruited from the
Gastroenterology and Liver Disease Research
Center, Shahid Beheshti University of Medical
Sciences. To avoid the effects of age and gender
on the metabolic profile, the healthy subjects were
matched with the CD subjects (9).
Experienced gastroenterologists diagnosed CD on
the basis of clinical and radiographic findings and
often colonoscopic criteria. Neither the CD nor the
healthy subjects who entered the study had any
other significant past medical history, such as
hypertension, diabetes mellitus or hyperlipidemia.
Serum samples were collected in the morning after
a 12-hour fast and stored at -70 °C until
measurement. ¹H NMR spectroscopy and data
preprocessing were described in detail in our
previous study (11).

Statistical analysis 

Partial least squares discriminant analysis
(PLS-DA) was performed using the PLS Toolbox
(Version 2, Eigenvector Research Inc.,
Manson, WA) within MATLAB (version 6.5.1,
The MathWorks, Cambridge, U.K.). PLS-DA
adapts a regression technique to a supervised
classification task and is frequently used for
classification. It is based on the partial least
squares (PLS) approach (12): the standard PLS
algorithm is applied, with the class labels used as
the dependent Y vector. In the two-class case, Y is
usually coded as 1 for one class and 0 or -1 for the
other class. Using this supervised technique, we can
identify the metabolites that differ between
diagnostic groups (13). In this study, the ¹H NMR
data and class labels were used as the X matrix and
the y vector, respectively. The dataset was divided
into two parts, a training set and a test set. The
training set was used to build the model and identify
the most relevant metabolites, and the test set was
used to evaluate the predictive ability of the
classification model.
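The PLS-DA step described above can be sketched as follows. This is a minimal sketch, not the PLS Toolbox routine used in the study: it implements the NIPALS form of single-response PLS (PLS1) in NumPy, assumes CD is coded 1 and healthy 0, and assigns a sample to CD when its predicted y exceeds 0.5. The simulated spectra, the number of latent variables, the threshold and all function names are hypothetical. The matrix R = W(PᵀW)⁻¹ maps centred spectra directly to latent-variable scores, which is how new (test-set) samples can be placed on a score plot.

```python
import numpy as np


def pls1_fit(X, y, n_components=2):
    """Fit a single-response PLS (PLS1) model with the NIPALS algorithm.

    X: (n_samples, n_variables) matrix of spectral variables.
    y: (n_samples,) class codes, here 1 = CD and 0 = healthy (assumed coding).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # mean-centred working copies
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                # unit-length weight vector
        t = Xr @ w                            # scores on this latent variable
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        c = (yr @ t) / tt                     # y loading
        Xr = Xr - np.outer(t, p)              # deflate X and y before the next LV
        yr = yr - c * t
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W = np.array(weights).T                   # (n_variables, n_components)
    P = np.array(loadings).T
    q = np.array(y_loadings)
    R = W @ np.linalg.inv(P.T @ W)            # maps centred X straight to scores
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "R": R, "B": R @ q}


def pls_scores(model, X):
    """Scores (LV1, LV2, ...) of training or new samples, as used in score plots."""
    return (np.asarray(X, dtype=float) - model["x_mean"]) @ model["R"]


def plsda_predict(model, X, threshold=0.5):
    """Assign class 1 (CD) when the predicted y exceeds the threshold, else 0 (healthy)."""
    y_hat = (np.asarray(X, dtype=float) - model["x_mean"]) @ model["B"] + model["y_mean"]
    return (y_hat > threshold).astype(int), y_hat


# Simulated data only: 26 "CD" and 29 "healthy" spectra with 50 variables,
# where variable 10 is made lower in the CD group.
rng = np.random.default_rng(0)
n_cd, n_healthy, n_vars = 26, 29, 50
X = rng.normal(size=(n_cd + n_healthy, n_vars))
X[:n_cd, 10] -= 5.0
y = np.r_[np.ones(n_cd), np.zeros(n_healthy)]

model = pls1_fit(X, y, n_components=2)
labels, y_hat = plsda_predict(model, X)
train_accuracy = (labels == y).mean()
print(f"training accuracy: {train_accuracy:.2f}")
```

With -1/+1 coding instead of 0/1, the natural threshold would be 0 rather than 0.5; the choice only shifts the decision boundary, not the fitted latent variables.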

Results 

In the classification step, PLS-DA was used to
classify CD and healthy samples. Approximately 30%
of the samples were randomly left out to form the test
set. Consequently, the training set included 39 ¹H NMR
spectra and the test set was composed of 16 spectra.
This procedure reduces the risk of over-fitting and the
possibility that the selected classification model owes
its performance to a chance correlation with
peculiarities of the test set (10). Figures 1 and 2
present the PLS-DA score plots of the serum ¹H NMR
spectra for the training and test sets, respectively.
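The random hold-out split described above can be sketched as follows, assuming a simple unstratified split (the text does not state whether the split was stratified by class) and a fixed test-set size of 16 of the 55 spectra, about 29%. The seed and the function name are hypothetical.

```python
import random


def random_split(n_samples, n_test, seed=None):
    """Randomly assign n_test sample indices to the test set and the rest to training."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    return sorted(order[n_test:]), sorted(order[:n_test])


# 55 spectra in total (26 CD + 29 healthy); 16 held out as the external test set
train_idx, test_idx = random_split(55, 16, seed=1)
print(len(train_idx), len(test_idx))  # 39 16
```

The test indices are set aside before any model fitting, so the test spectra play no part in building the model or selecting the discriminating variables.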
Figure 1. PLS-DA scores plot of serum dataset: training set
Figure 2. PLS-DA scores plot of serum dataset: test set



In Figures 1 and 2, the serum samples are plotted in
the space spanned by the first two latent variables
(LV1 versus LV2). The score plots show a clear
clustering according to class (CD and healthy
subjects). To explore which metabolites caused the
separation between CD and healthy subjects, the
loadings plot of the PLS-DA model is depicted in
Figure 3. Table 1 shows that a single metabolite,
lipid, was the most prominent variable in the
separation of the serum samples (P-value < 0.00001).




Figure 3. PLS-DA loadings plot: training set
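Reading a loadings plot amounts to finding the spectral variables with the largest absolute loadings on the discriminating latent variable. A minimal sketch with hypothetical LV1 loadings and ppm bins; the sign of a loading is only interpretable together with the class coding (assuming CD is coded 1 and CD samples score positively on LV1, a negative loading points to a variable that is lower in CD).

```python
def top_variables(loadings, ppm, k=1):
    """Return the k (ppm, loading) pairs with the largest absolute loading."""
    ranked = sorted(zip(ppm, loadings), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:k]


# Hypothetical LV1 loadings for five spectral bins (chemical shifts in ppm)
ppm = [0.90, 1.26, 2.02, 3.20, 5.30]
lv1 = [0.10, -0.85, 0.20, 0.05, -0.30]
print(top_variables(lv1, ppm))  # [(1.26, -0.85)]
```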

A confusion matrix (17) contains information
about the number of correct and incorrect predictions
made by a classification model compared with the
actual outcomes. The performance of such models is
commonly evaluated using the data in this matrix.
Table 2 shows the confusion matrix for the two-class
classifier. As Table 2 shows, the PLS-DA model has an
accuracy of 0.94 in detecting CD patients in the
external test set.
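The measures reported in Table 3 follow directly from the four counts of a two-class confusion matrix. The sketch below assumes CD is the positive class and reads the rows of Table 2 as predicted classes and the columns as observed classes; this is the reading under which the test-set counts reproduce the Table 3 values (sensitivity 7/7, specificity 8/9). Under the opposite reading, sensitivity and specificity would be 7/8 and 8/8, while the accuracy is 15/16 either way. The function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Performance measures for a two-class confusion matrix (CD = positive class)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {
        "error_rate": 1 - accuracy,
        "non_error_rate": accuracy,
        "sensitivity": tp / (tp + fn),   # observed CD samples predicted as CD
        "specificity": tn / (tn + fp),   # observed healthy samples predicted as healthy
        "accuracy": accuracy,
    }


# Test-set counts from Table 2 (rows read as predicted, columns as observed)
test = classification_metrics(tp=7, fn=0, fp=1, tn=8)
print({name: round(value, 2) for name, value in test.items()})
# {'error_rate': 0.06, 'non_error_rate': 0.94, 'sensitivity': 1.0, 'specificity': 0.89, 'accuracy': 0.94}
```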

Table 1. Specifications of the selected PLS-DA descriptor

Descriptor   Assignment        ¹H chemical shift (ppm)
Lipid        CH3CH2(CH2)n      1.26



Table 2. Confusion matrix for training and test sets

Set        Observation      CD class   Healthy class
Training   CD class         18         0
           Healthy class    0          21
Test       CD class         7          1
           Healthy class    0          8



A summary of the classification parameters is
shown in Table 3. These results suggest that the
PLS-DA classification model has good potential in
the diagnosis of CD. The area under the receiver
operating characteristic (ROC) curve (AUC) is often
used as a measure of the quality of classification
models. A random classifier has an AUC of 0.5,
while a perfect classifier has an AUC of 1. In
practice, most classification models have an AUC
between 0.5 and 1. The AUC values obtained for the
training and test sets are 1 and 0.94, respectively.
The high AUC of the proposed model for the samples
in the external test set is further evidence of the
capability of the PLS-DA model in CD detection.
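The AUC can be computed without drawing the ROC curve, since it equals the probability that a randomly chosen CD sample receives a higher score than a randomly chosen healthy sample, with ties counted as one half (the Mann-Whitney formulation). A minimal sketch, assuming the continuous PLS-DA output (predicted y) is used as the score; the scores below are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(score of a CD sample > score of a healthy sample), ties counted as 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted y values (higher = more CD-like)
print(roc_auc([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # 1.0 (perfect separation)
print(roc_auc([0.5, 0.5], [0.5, 0.5]))            # 0.5 (no discrimination)
```

The pairwise loop is quadratic in the number of samples, which is negligible for datasets of this size.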

Table 3. Error and non-error rates and classification performance of the PLS-DA model for the
training and test sets

Set            Error rate   Non-error rate   Specificity   Sensitivity   Accuracy
Training set   0            1                1             1             1
Test set       0.06         0.94             0.89          1             0.94



Discussion 

Investigation of the selected variable revealed
that the selected chemical shift corresponds to the
NMR signal of lipid (14-16). Consequently, the
discrimination of CD and control samples by the
PLS-DA model, based on NMR data, rests on the
different amounts of lipid in the two groups. This
result shows a reduced lipid level in the blood
serum of patients compared with healthy
individuals. Using the PLS-DA method, lipid was
identified as the most important metabolite. The
results of Kuroki et al. and Hrabovsky et al. for
lipid levels in CD are similar to ours: they found
that the serum levels of total lipids and total
cholesterol were reduced in patients with CD
(18, 19).

Lipids are molecules that include fat-soluble
vitamins (such as vitamins A, D, E and K),
monoglycerides, diglycerides, phospholipids, and
others. They are essential structural components of
cell membranes, and they also serve as a main
energy supply for cells and tissues. It has also
been shown that serum vitamin E levels are closely
associated with serum lipid levels (20, 21). In the
studies by Fernandez-Banares et al. and Kuroki et
al. (22, 23), serum concentrations of vitamin E
were also found to be lower in CD patients.

Fernandez-Banares et al. stated that the 
pathophysiological and clinical implications of the 
suboptimal vitamin status observed in acute CD 
are unknown. Kuroki and colleagues suggested 
that there is a variety of vitamin deficiencies in 
CD prior to treatment that may reflect the severity 
of the disease (23). 

Vitamin E is a fat-soluble antioxidant; it can
scavenge free radicals at the cellular level and
may also prevent much of the resulting scarring
in CD. Inflammation of the gut mucosa and
decreased oral intake increase the risk of vitamin
and mineral deficiencies in CD. The reduced lipid
levels may therefore be related to a reduced body
level of vitamin E. The predictions for the external
test set suggest that the classification obtained by
the PLS-DA technique is good enough to classify
unknown samples.

In the present work, a ¹H NMR-based
metabonomics approach provided evidence of a
clear metabolic difference between the two groups
(CD and control).

Since ¹H NMR-based metabonomics is effective
for monitoring disease progression and helpful
for discovering biomarkers of CD, we suggest that
NMR-based metabonomics may assist in the early
diagnosis of CD. Nevertheless, further
investigations are required to establish its real
usefulness in clinical practice.